The DSE determines how the document's location must be specified and whether or not the document is converted. The content type specified for a document determines whether or not the document is indexed.
A content collection makefile must specify the content-type attribute for a document as well as the content-type output by the DSE. If you specify a content-type that the server does not index, the NXT 4 server stores and retrieves the document as binary data.
The NXT 4 server indexes documents of the following content types (MIME types):
text/html
text/xml
text/plain
application/pdf
application/msexcel
application/msword
application/mspowerpoint
application/x-html-body-text
application/vnd.oasis.opendocument.text
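As a rough illustration of the rule above, the following Python sketch decides whether a declared content type would be indexed or stored as binary data. It is not part of NXT 4. The function name `storage_mode` is hypothetical, and comparing MIME types case-insensitively while ignoring parameters such as `charset` is an assumption about how the server matches types.

```python
# Content types listed above as indexed by the NXT 4 server.
INDEXED_CONTENT_TYPES = frozenset({
    "text/html",
    "text/xml",
    "text/plain",
    "application/pdf",
    "application/msexcel",
    "application/msword",
    "application/mspowerpoint",
    "application/x-html-body-text",
    "application/vnd.oasis.opendocument.text",
})


def storage_mode(content_type: str) -> str:
    """Return "indexed" or "binary" for a declared content type.

    Hypothetical helper. Assumes matching is case-insensitive and that
    parameters such as "; charset=utf-8" are ignored.
    """
    base_type = content_type.split(";", 1)[0].strip().lower()
    return "indexed" if base_type in INDEXED_CONTENT_TYPES else "binary"


print(storage_mode("application/msword"))  # indexed
print(storage_mode("image/png"))           # binary
```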
Note: The NXT 4 server uses the IFilter COM interface to extract terms from Microsoft Office and Adobe Acrobat PDF files.
Note: To index ODT documents, you must install the OpenOffice IFilters included with Apache OpenOffice 4.0 or later, or with comparable software such as LibreOffice 4.3 or later. You can download the required software from the official site and install it manually.
The NXT 4 server stores documents in their native format, which avoids the translation mistakes that often occur when data is converted between formats during storage and retrieval. However, storing a document in its native format also requires the browser to have a plug-in that can display that format.
The NXT 4 server does not alter a document when storing or retrieving it, but you can use a DSE to modify the format of a document before it is stored in a content collection.
Before you can use a DSE, you must define a dse element for it. The class-id attribute specifies the COM class ID of the DSE to use. The class IDs of the NXT DSEs are specified in makefile.dtd.
In addition to the dse elements defined in makefile.dtd, you can define other dse elements that use these DSEs. Each definition must have a unique id (name). Because each definition is identified separately, you can specify different parameters for the same DSE in each dse element definition.
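To make the unique-id rule concrete, here is a small Python model of a set of dse definitions. It is only a sketch: `DseDefinition`, `register`, the ids `convert-a` and `convert-b`, and the `mode` parameter are all hypothetical, and the class ID is a placeholder rather than a real NXT value (the real IDs are in makefile.dtd). It shows two definitions that share one DSE (the same class ID) but carry different options, and rejects a duplicate id.

```python
from dataclasses import dataclass, field


@dataclass
class DseDefinition:
    """Hypothetical model of one dse element: its id, COM class ID, and options."""
    id: str
    class_id: str
    params: dict = field(default_factory=dict)


def register(definitions: dict, definition: DseDefinition) -> None:
    """Add a definition, rejecting an id that is already in use."""
    if definition.id in definitions:
        raise ValueError(f"duplicate dse id: {definition.id!r}")
    definitions[definition.id] = definition


# Placeholder class ID, not a real NXT value; the actual IDs are in makefile.dtd.
PLACEHOLDER_CLASS_ID = "{00000000-0000-0000-0000-000000000000}"

definitions = {}
# Two definitions that use the same DSE (same class ID) with different options.
register(definitions, DseDefinition("convert-a", PLACEHOLDER_CLASS_ID, {"mode": "a"}))
register(definitions, DseDefinition("convert-b", PLACEHOLDER_CLASS_ID, {"mode": "b"}))
```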
After you define a dse element, you can use the DSE by specifying its ID in the dse attribute of a document element. You may want to use certain options for one set of files and different options for another set. To do this, define two dse elements, each specifying different options. For each document, specify the ID of the dse element whose options you want applied when the document is converted.
If you do not specify a dse attribute for a document, ccBuild uses the default DSE defined by default-dse-id in makefile.dtd.
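The selection rule in the last two paragraphs can be sketched in Python as follows. This is not ccBuild code: `resolve_dse_id` and the ids `default-dse`, `convert-a`, and `convert-b` are hypothetical, a plain dictionary stands in for a document element's attributes, and treating a reference to an undefined dse id as an error is an assumption.

```python
def resolve_dse_id(document: dict, defined_ids: set, default_dse_id: str) -> str:
    """Return the id of the dse element to apply to one document element.

    `document` stands in for the document element's attributes and
    `default_dse_id` for the default-dse-id value from makefile.dtd.
    """
    dse_id = document.get("dse")
    if dse_id is None:
        # No dse attribute: fall back to the default DSE, as ccBuild does.
        return default_dse_id
    if dse_id not in defined_ids:
        # Assumption: referencing an undefined dse id is treated as an error.
        raise KeyError(f"undefined dse id: {dse_id!r}")
    return dse_id


defined_ids = {"default-dse", "convert-a", "convert-b"}
print(resolve_dse_id({"dse": "convert-b"}, defined_ids, "default-dse"))  # convert-b
print(resolve_dse_id({}, defined_ids, "default-dse"))                    # default-dse
```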
The DSE architecture supports chaining, which means that one DSE takes as its input the output of another DSE. When defining a dse element, use the chain attribute to specify the other DSE that it chains to.
NXT 4 supports both Unicode and ANSI DSEs. Because the two DSE interfaces differ, chaining a Unicode DSE with an ANSI DSE is not recommended and in most situations does not work. A DSE can be written to perform the string conversion itself, which allows chaining between Unicode and ANSI DSEs, but this conversion does not occur automatically.
Note: The FSysDSE supplied with NXT 4 does not support chaining to other DSEs.
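The following Python sketch shows one way to check a chain before a build: it follows chain attributes from a starting dse, stops where a definition has no chain attribute, rejects cycles, and lists every link where a Unicode DSE meets an ANSI DSE. The helper names, the ids `dse-a` to `dse-c`, and the `interface` field are hypothetical. The sketch only reports mixed links rather than refusing them, because a DSE may handle the string conversion itself.

```python
from typing import Dict, List, Tuple


def walk_chain(dses: Dict[str, dict], start_id: str) -> List[str]:
    """Follow chain attributes from start_id; return the ids in order."""
    chain = [start_id]
    while (next_id := dses[chain[-1]].get("chain")) is not None:
        if next_id in chain:
            raise ValueError(f"chain cycle at {next_id!r}")
        chain.append(next_id)
    return chain


def mixed_interface_links(dses: Dict[str, dict], chain: List[str]) -> List[Tuple[str, str]]:
    """Return linked pairs where a Unicode DSE meets an ANSI DSE."""
    return [(a, b) for a, b in zip(chain, chain[1:])
            if dses[a]["interface"] != dses[b]["interface"]]


# Hypothetical dse definitions; "interface" is either "unicode" or "ansi".
dses = {
    "dse-a": {"interface": "unicode", "chain": "dse-b"},
    "dse-b": {"interface": "ansi", "chain": "dse-c"},
    "dse-c": {"interface": "ansi"},  # no chain attribute: end of the chain
}
chain = walk_chain(dses, "dse-a")
print(chain)                               # ['dse-a', 'dse-b', 'dse-c']
print(mixed_interface_links(dses, chain))  # [('dse-a', 'dse-b')]
```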
The DSE is involved in the following sequence of events when building a content collection or update file.
Optionally, a DSE can preprocess the document by adding, changing, or removing data, or even by changing its format. The content-type specified for the document should be the final content type produced after the DSE finishes.
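To illustrate why the declared content-type must describe the DSE's final output, this Python sketch runs a document through a list of preprocessing steps and then compares the declared type with the type that comes out. Everything here is hypothetical: `run_preprocessing`, `check_declared_type`, and the `wrap_text_as_html` step (which assumes UTF-8 input) are illustrations, not NXT 4 APIs.

```python
import html
from typing import Callable, List, Tuple

# A preprocessing step maps (data, content_type) to new (data, content_type).
Step = Callable[[bytes, str], Tuple[bytes, str]]


def run_preprocessing(data: bytes, content_type: str, steps: List[Step]) -> Tuple[bytes, str]:
    """Apply each step in order and return the final data and content type."""
    for step in steps:
        data, content_type = step(data, content_type)
    return data, content_type


def wrap_text_as_html(data: bytes, content_type: str) -> Tuple[bytes, str]:
    """Hypothetical DSE step: wrap UTF-8 plain text in a minimal HTML page."""
    if content_type != "text/plain":
        return data, content_type  # leave other formats untouched
    body = html.escape(data.decode("utf-8")).encode("utf-8")
    return b"<html><body><pre>" + body + b"</pre></body></html>", "text/html"


def check_declared_type(declared: str, final: str) -> None:
    """Raise if the makefile's content-type differs from the DSE's final output."""
    if declared != final:
        raise ValueError(f"declared {declared!r}, but the DSE produced {final!r}")


data, final_type = run_preprocessing(b"hello", "text/plain", [wrap_text_as_html])
check_declared_type("text/html", final_type)  # matches, so no error
```

Declaring `text/plain` here, the source format, would be the mistake the paragraph warns against, since the stored document is HTML.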
The File System DSE (FSysDSE) imports documents using the file services provided by the operating system. FSysDSE reads files from disk and passes them to ccBuild or another DSE. The most common use of FSysDSE is to import graphics, HTML, Word, Excel, PowerPoint, PDF, and XML documents.
Note: The NXT 4 server supports importing XML documents. For browsers that cannot display XML, it provides a server-side translation of XML to HTML or DHTML through a display filter.
When using FSysDSE to store a document, a content collection makefile should specify the content-type corresponding to the source document's data format. Typically, that content-type also corresponds to the application or plug-in that will handle the document once it reaches the browser.
FSysDSE stores the document in its native form. The indexer uses internal filters to extract the text from document types such as PDF, Word, Excel, PowerPoint, WordPerfect, HTML, plain text, and XML. The extracted text is used to index the documents, but the original document is stored in its native format.
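The split between what is stored and what is indexed can be sketched in a few lines of Python. This is a toy model, not the NXT 4 indexer: `store_document` and `strip_tags` are hypothetical, and `strip_tags` is a crude stand-in for an internal HTML filter. The point it shows is that the stored bytes stay identical to the input while only the extracted text feeds the index.

```python
import re
from typing import Callable, Dict, Set


def store_document(store: Dict[str, bytes], index: Dict[str, Set[str]],
                   doc_id: str, data: bytes,
                   extract_text: Callable[[bytes], str]) -> None:
    """Keep the bytes unchanged; index only the text the filter extracts."""
    store[doc_id] = data  # native form, never modified
    for term in re.findall(r"\w+", extract_text(data).lower()):
        index.setdefault(term, set()).add(doc_id)


def strip_tags(data: bytes) -> str:
    """Toy stand-in for an HTML filter: drop tags, keep the text."""
    return re.sub(r"<[^>]+>", " ", data.decode("utf-8"))


store: Dict[str, bytes] = {}
index: Dict[str, Set[str]] = {}
page = b"<html><body>Rocket NXT</body></html>"
store_document(store, index, "doc1", page, strip_tags)
```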
In addition to the FSysDSE shipped with NXT 4, you can create other DSEs using the DSE API. For more information, see the documentation that is included with NXT 4 Builder (../Rocket/NXT 4/Builder/dseapi/dseapi.nxt).
By default, this collection is not installed. Use Content Network Manager to mount the collection on your default site or wherever you want it to be accessible.
Copyright © 2006-2023, Rocket Software, Inc. All rights reserved.